3. Changes and Additions
The major additions and changes for the basic services and
tools of the Performance Co-Pilot are described in the
following sections.
Refer to the reference pages of the individual utilities for
a complete description of any new functionality.
3.1 Infrastructure Changes
The following changes have been made to the PCP
infrastructure that affect both collector and monitor
configurations.
3.1.1 Changes for IRIX 6.5
The following incidents were resolved for IRIX 6.5.13.
817880 The remove command to pmafm(1) was not listing all
of the files used by pmlogger(1) for PCP archive
folios created with the ``record'' facility of the
GUI tools.
803341 The default ``replay'' tool for PCP archive folios
created by mkaf directly was changed from mkaf to
pmchart. This makes the pmafm replay function more
useful. The mkaf(1) man page was updated to be more
precise about the interactions between mkaf and
pmafm.
The following incidents were resolved for IRIX 6.5.10.
794379 The routine responsible for parsing the PCP metrics
namespace (pmLoadNameSpace(3)) incorrectly accepted
hyphens in metric names.
The following incidents were resolved for IRIX 6.5.8.
768814 Resolved some diskless install problems.
773035 The xvm PMDA exports mirror revive state
information.
The following incidents were resolved for IRIX 6.5.6.
764463 A new xvm PMDA was added to export performance
statistics from the XVM volume manager.
3.1.2 PCP 2.1 to PCP 2.2
1. PCP 2.2 for both IRIX and Linux is now built from a
single source code base. Although the list of features
and the packaging may differ between the distributions,
you may see some evidence of minor changes that are a
result of unifying the product development process for
both platforms.
2. A standard set of environment variables is defined in
/etc/pcp.conf and described in pcp.conf(4). These
variables are generally used to specify the location
of various PCP pieces in the file system. They may be
loaded into shell scripts by sourcing the /etc/pcp.env
shell script (see pcp.env(4)) and queried by C/C++
programs using the __pmGetConfig(3) library function.
See the PCPIntro(1) man page for further details.
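
As a rough illustration of how a script might consume these
settings, the following Python sketch reads NAME=value lines in
the style described in pcp.conf(4). The one-setting-per-line
layout with # comments is an assumption here, and the variable
names in the sample are hypothetical; C/C++ programs should use
__pmGetConfig(3) as noted above.

```python
def parse_pcp_conf(text):
    """Return a dict of NAME -> value from NAME=value lines.

    Assumes one setting per line; blank lines, '#' comments and
    lines without '=' are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


# Hypothetical excerpt; these names are illustrative only.
sample = """\
# example settings
PCP_EXAMPLE_DIR=/usr/pcp/example
PCP_EXAMPLE_LOG_DIR=/var/adm/pcplog
"""

config = parse_pcp_conf(sample)
print(config["PCP_EXAMPLE_DIR"])
```

To use this against a real system, pass the contents of
/etc/pcp.conf to parse_pcp_conf() in place of the sample text.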
3.1.3 PCP 2.0 to PCP 2.1
1. To help with PCP deployments on systems running
operating systems other than IRIX, the Performance
Metrics Name Space (PMNS) has been overhauled to
remove the irix. prefix from the names of the
system-centric performance metrics, e.g.
irix.disk.dev.read_bytes has become
disk.dev.read_bytes. In addition to changing the
PMNS, translations are also handled dynamically in the
PCP libraries, so all clients will continue to operate
correctly using either the new or the old names. As a
consequence no configuration files will need to be
changed, and monitoring tools will work correctly in
environments with a mixture of new and old style PMNS
deployments.
2. The PCP inference engine pmie(1) has migrated from the
pcp.sw.monitor subsystem to the pcp_eoe.sw.eoe
subsystem, and the licensing restrictions have been
relaxed to allow pmie to be used to monitor
performance on the local host without any PCP
licenses.
3. Support for running pmie(1) as a daemon has been
added. This has many similarities to the pmcd(1) and
pmlogger(1) daemon support: pmie can be controlled
through the chkconfig(1) interface and the startup
and shutdown script /etc/init.d/pmie, which supports
starting and stopping multiple pmie instances
monitoring one or more hosts. This is achieved with
the assistance of another script, pmie_check(1), which
is similar to the pmlogger support script
pmlogger_check(1).
4. New capabilities have been added to assist in the
estimation of PCP archive sizes. The -r option for
pmlogger(1) causes the size of the physical record(s)
for each group of metrics and the expected
contribution of the group to the size of the PCP
archive for one full day of collection to be reported
in the log file. The -s option to pmdumplog(1) will
report the size in bytes of each physical record in
the archive.
5. Changes to pmlogger(1) have greatly reduced the size
of the *.meta files created when logging metrics with
instance domains that change over time.
6. As an aid to creating pmlogger configuration files,
pmlogconf(1) is a new tool that allows selection of
groups of commonly desired metrics and customization
of pmlogger configurations from a simple interactive
dialog.
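
The archive size estimation described in item 4 above comes down
to simple arithmetic. The Python sketch below shows one way to
project a day's contribution from a physical record size and a
logging interval. It assumes one record per group per interval
and ignores any per-archive overhead, so it is an approximation
and not necessarily the calculation pmlogger(1) performs; the
group names and sizes are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_bytes(record_bytes, interval_seconds):
    """Estimate bytes added to an archive in one day by one group.

    Assumes exactly one physical record of record_bytes is written
    every interval_seconds.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return record_bytes * SECONDS_PER_DAY / interval_seconds


# Hypothetical groups: (record size in bytes, logging interval in seconds)
groups = {
    "disk": (512, 60),
    "network": (256, 30),
}

total = sum(daily_bytes(size, interval) for size, interval in groups.values())
print("estimated bytes per day:", int(total))
```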
3.2 Collector Changes
The following changes affect PMCD and the PMDAs that provide
the collection services.
3.2.1 Libirixpmda Changes for IRIX 6.5
The following incidents were resolved for IRIX 6.5.13.
616514 Export some bufview reported metrics.
815636 Fix network.interface.baudrate scale to match units.
822285 Export CXFS metrics if SGI_IS_OS_CELLULAR.
825330 Export event counter metrics for R12K & R14K.
826783 Export stats for meta and repeater routers on SGI
Origin 3000 Series systems.
The following incidents were resolved for IRIX 6.5.12.
814585 Export vnode freelist metrics.
The following incidents were resolved for IRIX 6.5.11.
789419 Export tpsc metrics.
807502 Export gfxinfo metrics.
807799 Fix handling of 1394 disk names.
The following incidents were resolved for IRIX 6.5.10.
794983 Fix hinv.cputype for RM5271 and RM7000 cpus.
785163 Export metric for kernel memory per node.
The following incidents were resolved for IRIX 6.5.9.
790121 Provide support for SN1.
No significant libirixpmda changes were made for IRIX 6.5.7
or 6.5.8.
The following incidents were resolved for IRIX 6.5.6.
764170 Provide support for Fibre Channel disks.
The following incidents were resolved for IRIX 6.5.5.
649767 Export metrics for streams data, which are also
exported by netstat -m.
682896 The semantics of the metrics of
xbow.{port|total}.{src|dst} have changed from
reporting transfer of bytes to transfer of
micropackets as it is impossible to tell how many
bytes of data are really transferred.
The following incidents were resolved for IRIX 6.5.4.
675673 Export some additional xfs inode cluster metrics.
The following incidents were resolved for IRIX 6.5.3.
628012 Export wait I/O metrics.
The following incidents were resolved for IRIX 6.5.2.
558773 Export metrics for the instantaneous disk queue
length and for the running sum of the disk queue
lengths.
The following incidents were resolved for IRIX 6.5.1.
588158 A section called "Enabling of Statistics Collection"
has been added to the libirixpmda(5) man page.
603178 Extra diagnostic messages were added to log the
state changes of turning the xlv statistics
gathering on or off.
3.2.2 Other Changes for IRIX 6.5
The following incidents were resolved for IRIX 6.5.13.
814533 Support added for instrumentation of activity from
the kernel's cluster infrastructure heartbeat
services, as used by CXFS and FailSafe.
820896 A new mmv PMDA was added to support light-weight
export of performance data from system daemons.
822509 Export activity statistics from the cluster
infrastructure fs2d(1) daemon using the mmv PMDA.
823395 Export activity statistics from the cluster
infrastructure daemons clconfd(1), crsd(1), cad(1)
and the cad plugins.
The following incidents were resolved for IRIX 6.5.12.
813494 A logic error in handling error returns from some
system calls caused the xvm PMDA to fail to
correctly enumerate the instance domain of XVM
volume elements and physical volumes.
The following incidents were resolved for IRIX 6.5.11.
801248 Overflow in intermediate results caused some of the
memory metrics from the proc PMDA to overflow
prematurely.
The following incidents were resolved for IRIX 6.5.10.
782226 Add ``job id'' to proc PMDA.
3.2.3 PCP 2.1 to PCP 2.2
1. The interface provided by libpcp_pmda between PMCD and
a PMDA has been extended to support a new protocol
(PMDA_INTERFACE_3), with the following semantics for
the return codes from pmdaFetch(3) callbacks:
_______________________________________________________________
Interface            Return Value   Meaning
_______________________________________________________________
PMDA_INTERFACE_1     < 0            Value is an error code (e.g.
or                                  PM_ERR_PMID, PM_ERR_INST or
PMDA_INTERFACE_2                    PM_ERR_AGAIN)
                     >= 0           Success
_______________________________________________________________
PMDA_INTERFACE_3     < 0            Value is an error code (e.g.
                                    PM_ERR_PMID, PM_ERR_INST or
                                    PM_ERR_AGAIN)
                     0              The metric value is not
                                    currently available
                     > 0            Success
_______________________________________________________________
These changes allow more detail to be passed back from
the PMDA to the clients via PMCD in the cases where
metric values are legitimately not currently available
(as opposed to some error condition preventing the
metric value from being fetched).
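
The return code semantics in the table above can be summarized
in a few lines of Python. This is only an illustrative sketch of
how a caller might interpret a fetch callback's return value for
each interface version; the function name and the outcome
strings are hypothetical and are not part of libpcp_pmda.

```python
def fetch_outcome(interface_version, status):
    """Classify a fetch callback return value.

    interface_version: 1, 2 or 3 (PMDA_INTERFACE_1 .. PMDA_INTERFACE_3)
    status: the integer returned by the callback
    """
    if interface_version not in (1, 2, 3):
        raise ValueError("unsupported interface version")
    if status < 0:
        # Negative values are error codes for every interface version.
        return "error"
    if interface_version == 3 and status == 0:
        # Only PMDA_INTERFACE_3 distinguishes "no value right now".
        return "not currently available"
    return "success"


for version, status in [(2, 0), (3, 0), (3, 1), (3, -1)]:
    print(version, status, fetch_outcome(version, status))
```

The only difference between the versions is the treatment of
zero, which is why the version must be passed alongside the
status.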
3.2.4 PCP 2.0 to PCP 2.1
1. The proc agent has been changed to use
/proc/pinfo/xxxx if possible and only use /proc/xxxx
if there is no alternative. Previously this agent
always used /proc/xxxx to extract process information.
This caused unnecessary access checking to take
place, and some NFS contention problems were reported
as a result.
2. The new espping PMDA provides quality of service
metrics for consumption by the Embedded Support
Partner (ESP) infrastructure (released in IRIX 6.5.5).
This PMDA can be used in conjunction with pmie(1)
rules generated by the new pmieconf(1) tool to detect
service failure on either local or remote hosts.
Among the services which can be probed are ICMP, SMTP,
NNTP, pmcd, and local HIPPI interfaces using the new
hipprobe(1) utility.
3. The way pmcd(1) and pmlogger(1) are started from
/etc/init.d/pcp has changed.
a. When pmlogger is chkconfig'd on, pmlogger
instances are launched in the background from
/etc/init.d/pcp start, which allows faster
system reboots. As a result, some diagnostics
from pmlogger and/or
/usr/pcp/bin/pmlogger_check that previously
appeared when /etc/init.d/pcp was run may now be
generated asynchronously; any such messages are
forwarded to the root user as e-mail. These
messages are in addition to those already
written to /var/adm/pcp/NOTICES by pmpost(1)
from /etc/init.d/pcp.
b. A new utility, pmcd_wait(1), provides a more
reliable mechanism for detecting that pmcd is
ready to accept client connections.
4. In concert with changes to pmie, the pmcd PMDA has
been extended to export information about executing
pmie instances and their progress in terms of rule
evaluations and action execution rates. Refer to the
pmcd.pmie.* metrics.
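
The general idea behind waiting for a daemon to accept client
connections (item 3b above) can be sketched as a polling loop.
The Python below is not how pmcd_wait(1) is implemented; it
simply retries a TCP connection until it succeeds or a deadline
passes. The function name is hypothetical, and the host and port
are parameters because the port pmcd listens on depends on the
installation.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, poll_interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection could be made before timeout
    seconds have elapsed. A successful connect only shows that
    something is listening, not that it is fully initialized.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=poll_interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
```

A startup script could call wait_for_port("localhost", port)
with the locally configured port before launching tools that
need the collector, and report a failure if it returns False.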
3.3 Monitor Changes
The major additions and changes for the performance
visualization and analysis tools are described below.
3.3.1 Changes for IRIX 6.5
The following incidents were resolved for IRIX 6.5.13.
807561 The semantics for the -i and -I options to
pmprobe(1) were changed to allow all instances (not
just the ones found at the next pmFetch(3)) to be
reported.
814452 Timestamps have been added to the pmie(1) output
when the -v option is used.
823023 Added the new topio(1) tool to measure process-level
demand for I/O bandwidth.
The following incidents were resolved for IRIX 6.5.12.
815387 The list of instances reported by pmval(1) was not
being sorted, and this caused some confusion for
metrics with an underlying instance domain that
changed over time.
The following incidents were resolved for IRIX 6.5.11.
803336 Better creation of pmafm(1) archive folios from the
``record'' mode of oview(1) for both SGI Origin 2000
and Origin 3000 Series systems.
The following incidents were resolved for IRIX 6.5.9.
790122 Add support for SGI Origin 3000 Series systems in
oview(1).
The following incidents were resolved for IRIX 6.5.8.
776214 Better handling of error return codes for telnet
commands used by the espping and shping PMDAs.
781065 The generic pmie rules supported by pmieconf have
been extended to allow alarm notification to be
passed to EnlightenDSM.
3.3.2 PCP 2.1 to PCP 2.2
1. Support for the SGI Origin 3000 Series servers has
been added, with new visualisation features specific
to these servers and a complete re-write of the
oview(1) monitoring application.
3.3.3 PCP 2.0 to PCP 2.1
1. pmie
a. A syntactic restriction in the specification
language has been relaxed, and actions may now
have an arbitrary number of quoted arguments
(previously at most two arguments were allowed).
At the same time a problem with the syslog
action was resolved, allowing the -p option to
be passed to logger(1). For example, this is
now valid:
some_inst (
(100 * filesys.used / filesys.capacity) > 98 )
-> syslog "-p daemon.info 'file system close to full"
" %h:[%i] %v% " "'";
b. Metrics with dynamic instance domains are now
correctly handled by pmie. Previously pmie
instantiated the instance domain when it
started, and was oblivious to any subsequent
changes in the instance domain. This is most
useful for rules using the metrics of the
hotproc PMDA that is part of the pcp product.
c. The pmie language has been extended to allow two
new operators match_inst and nomatch_inst that
take a regular expression and a boolean
expression. The result is the boolean AND of
the expression and the result of matching (or
not matching) the associated instance name
against the regular expression.
For example, this rule evaluates error rates on
various 10BaseT Ethernet network interfaces
(e.g. ecN, etN or efN):
some_inst
match_inst "^(ec|et|ef)"
network.interface.total.errors > 10 count/sec
-> syslog "Ethernet errors:" " %i";
The following rule evaluates available free
space for all filesystems except the root
filesystem:
some_inst
nomatch_inst "/dev/root"
filesys.free < 10 Mbytes
-> print "Low filesystem free (Mb):" " [%i]:%v";
d. During rule evaluation, pmie keeps track of the
expected number of rule evaluations, number of
rules actually evaluated, the number of
predicates that are true and false, the number
of actions executed, etc. These statistics are
maintained as binary data structures in the
mmap'ed files /var/tmp/pmie/<pid>. If pmie is
running on a system with a PCP collector
deployment, the pmcd PMDA exports these metrics
via the new pmcd.pmie.* group of metrics.
e. Some restrictions on the expansion of macros
(e.g. $name) have been removed, so macro
expansion can occur anywhere in the pmie rule
specifications.
f. There have been some changes to improve the
formatting of numeric values reported with the
options -v, -V and -W, and for the expansion of
%v in actions. In general terms these have
removed extra white space and reduced the
likelihood of scientific notation being used.
2. A set of parameterized pmie rules has been developed
that is applicable to most systems and allows
pmie to be used by new users without knowledge of the
pmie syntax. A new utility, pmieconf(1), has been
built which allows these rules to be enabled or
disabled, or the parameters and thresholds adjusted
for a specific system.
The combination of pmie, pmieconf, pmie_check and
/etc/init.d/pmie provides the infrastructure required
for PCP to search for behavior indicative of
performance problems in a fully automated manner with
little or no local customization required. Where
customization is needed, pmieconf(1) provides a
convenient way of doing this.
3. A new utility, hipprobe(1), has been added which will
check the status of HIPPI interfaces on a system.
More sophisticated monitoring of HIPPI interfaces will
be supported with a hippi PMDA that will be released
as part of the forthcoming PCP for HPC add-on product.
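
The match_inst and nomatch_inst semantics described in item 1c
above (the boolean AND of an expression with the result of
matching, or not matching, each instance name) can be modelled
in Python as follows. This is a sketch of the semantics only,
not of pmie itself; Python's re module stands in for whatever
regular expression library pmie uses, and the instance names and
values are made up for illustration.

```python
import re


def match_inst(pattern, values, predicate):
    """Per instance: name matches pattern AND predicate(value) holds."""
    regex = re.compile(pattern)
    return {
        name: bool(regex.search(name)) and predicate(value)
        for name, value in values.items()
    }


def nomatch_inst(pattern, values, predicate):
    """Per instance: name does NOT match pattern AND predicate(value) holds."""
    regex = re.compile(pattern)
    return {
        name: not regex.search(name) and predicate(value)
        for name, value in values.items()
    }


# Hypothetical error rates (count/sec) keyed by interface name.
errors = {"ec0": 12.0, "ef0": 3.0, "lo0": 50.0}
ethernet_alarm = match_inst("^(ec|et|ef)", errors, lambda v: v > 10)

# some_inst in a rule corresponds to "true for at least one instance".
print(any(ethernet_alarm.values()))
```

In this model lo0 exceeds the threshold but is excluded by the
pattern, while ef0 matches the pattern but is below the
threshold, so only ec0 contributes a true result.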
3.4 Features Removed or Deprecated
In PCP 2.2, the following features from earlier PCP versions
have been removed or are deprecated.
None.